Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives
We consider an agent involved in an online Markov decision process who receives a vector of outcomes every round. The agent aims to simultaneously optimize multiple objectives associated with the multi-dimensional outcomes. Due to state transitions, it is challenging to balance the vectorial outcomes to achieve near-optimality. In particular, contrary to the single-objective case, stationary policies are generally sub-optimal. We propose a no-regret algorithm based on the Frank-Wolfe algorithm (Frank and Wolfe 1956), UCRL2 (Jaksch et al. 2010), and a crucial and novel gradient threshold procedure. The procedure carefully delays gradient updates and returns a non-stationary policy that diversifies the outcomes to optimize the objectives.
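To make the abstract's algorithmic recipe concrete, here is a minimal sketch of a Frank-Wolfe-style loop with thresholded (delayed) gradient updates, as described above. This is an illustration, not the paper's algorithm: the names `f_grad`, `solve_scalarized_policy`, `run_episode`, and `threshold` are hypothetical placeholders, and the sketch omits the UCRL2 confidence sets and episode-stopping rules.

```python
import numpy as np

def frank_wolfe_rl_sketch(f_grad, solve_scalarized_policy, run_episode,
                          d, horizon, threshold=0.1):
    """Hypothetical sketch: Frank-Wolfe-style scalarization with delayed
    gradient updates; not the paper's exact algorithm."""
    x = np.zeros(d)       # running average of the outcome vectors
    t = 0                 # rounds elapsed so far
    g = f_grad(x)         # gradient of the concave objective at x
    while t < horizon:
        # Scalarize the vectorial reward with the current gradient and
        # plan (UCRL2-style, optimistically) for that scalar reward.
        policy = solve_scalarized_policy(g)
        # Follow the policy for one episode; outcomes has shape (steps, d).
        outcomes, steps = run_episode(policy)
        x = (t * x + outcomes.sum(axis=0)) / (t + steps)
        t += steps
        # Gradient threshold procedure: only recompute the gradient (and
        # hence switch policy) once it has drifted far enough, so each
        # policy is played long enough for its gain to take effect.
        new_g = f_grad(x)
        if np.linalg.norm(new_g - g) > threshold:
            g = new_g
    return x
```

The resulting sequence of policies is non-stationary by design, which matches the abstract's point that stationary policies are generally sub-optimal here.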
Reviews: Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives
Summary: This paper studies a generalization of online reinforcement learning (in the infinite-horizon undiscounted setting with finite state and action spaces and a communicating MDP) in which the agent aims to maximize a certain type of concave function of the rewards (extended to global concave functions in the appendix). More precisely, every time an action "a" is played in state "s", the agent receives a vector of rewards V(s,a) (instead of a scalar reward r(s,a)) and tries to maximize a concave function of the empirical average of the vectorial outcomes. This problem is very general and models a wide variety of settings, ranging from multi-objective optimization in MDPs to maximum-entropy exploration and online learning in MDPs with knapsack constraints. In Section 2 the authors introduce the necessary background and formalize the notions of "optimal gain" and "regret" in this setting. Defining the "optimal gain" (called the "offline benchmark" in the paper) is not straightforward.
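One natural way to formalize the objective and regret sketched in this summary is the following (an assumption for illustration: outcomes lie in a bounded set, and the benchmark is the best achievable long-run average outcome; the paper's exact "offline benchmark" may be defined differently):

```latex
% Assumed formalization, not the paper's verbatim definitions:
% V(s_t, a_t) \in [0,1]^d is the outcome vector received at round t,
% f is the concave objective, and \mathcal{X} is the set of long-run
% average outcome vectors achievable by (possibly non-stationary)
% policies in the communicating MDP.
\[
  \bar{V}_T = \frac{1}{T}\sum_{t=1}^{T} V(s_t, a_t),
  \qquad
  \mathrm{opt} = \max_{x \in \mathcal{X}} f(x),
  \qquad
  \mathrm{Reg}(T) = \mathrm{opt} - f\bigl(\bar{V}_T\bigr).
\]
```

The subtlety the reviewer points to is in defining $\mathcal{X}$: because the function is applied to the average outcome rather than summed per round, the benchmark must range over non-stationary behaviors, not just stationary policies.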
Two out of three reviewers appreciated the contributions of this paper, with one expert reviewer praising almost every aspect of the paper. On the negative side, one reviewer took issue with the proposed setting, highlighting that the utility of the proposed objective function is somewhat dubious in the general context of multi-objective decision making. I agree with this reviewer that having "multi-objective" in the title of the paper may set the wrong expectations for some readers, and I suggest that the authors consider changing the title for the final version to avoid such misunderstandings. Furthermore, the final version should discuss the relationship between this paper and the very recent work of Rosenberg and Mansour (2019), which studies essentially the same problem in episodic MDPs. Other than these concerns, the paper is worthy of being published without major changes.